Mining Interesting Itemsets using Submodular Optimization
Author
Abstract
We propose a novel technique to retrieve itemsets that best explain a transaction database by leveraging a simple probabilistic model. Our approach is the first to infer such interesting itemsets directly from the transaction database using submodular function optimization and in so doing avoids many of the pitfalls commonly present in frequent itemset mining algorithms. Our proposed approach is theoretically simple, straightforward to implement, trivially parallelizable and exhibits good performance as we demonstrate on both synthetic and real-world examples.
Similar resources
A New Approach for Mining Top-Rank-k Erasable Itemsets
Erasable itemset mining, first introduced in 2009, is an interesting variation of pattern mining. Managers can use erasable itemsets to plan a factory's production. Besides the problem of mining erasable itemsets, mining top-rank-k erasable itemsets is an interesting and practical problem. In this paper, we first propose a new structure, called dPID_List, and two ...
On Mining Max Frequent Generalized Itemsets
A fundamental task of data mining is to mine frequent itemsets. Since the number of frequent itemsets may be large, a compact representation, namely the max frequent itemsets, has been introduced. On the other hand, the concept of generalized itemsets was proposed. Here, the items form a taxonomy. Although the transactional database only contains items in the leaf level of the taxonomy, a gener...
Efficient Computation of Partial-Support for Mining Interesting Itemsets
Mining interesting itemsets is a popular topic in the data mining community. The objective of this problem is to mine all interesting itemsets with respect to a given interestingness measure. While considerable effort has been spent on justifying the various interestingness measures, the algorithms that mine them are not well studied, except in the case of support, which has resulted in ...
Depth-First Non-Derivable Itemset Mining
Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collecti...
Mining Frequent Itemsets Using Support Constraints
Interesting patterns often occur at varied levels of support. Classic association mining based on a uniform minimum support, such as Apriori, either misses interesting patterns of low support or suffers from the bottleneck of itemset generation. A better solution is to exploit support constraints, which specify what minimum support is required for which itemsets, so that only necessary itemse...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014